Grid cells in the brain respond when an animal occupies a periodic lattice of
"grid fields" during spatial navigation. The grid scale varies along the dorso-ventral
axis of the entorhinal cortex. We propose that the grid system minimizes
the number of neurons required to encode location with a given resolution. We
derive several predictions that match recent experiments: (i) grid scales follow a
geometric progression, (ii) the ratio between adjacent grid scales is $\sqrt{e}$ for idealized
neurons, and robustly lies in the range 1.4-1.7 for realistic neurons, (iii)
the scale ratio varies modestly within and between animals, (iv) the ratio between 
grid scale and individual grid field widths at that scale also lies in this range, (v) 
grid fields lie on a triangular lattice. The theory also predicts the optimal grids in 
one and three dimensions, and the total number of discrete scales. 



Introduction 

How does the brain represent space? Tolman (1) suggested that the brain must have an explicit neural
representation of physical space, a cognitive map, that supports higher brain functions such as
navigation and path planning. The discovery of place cells in the rat hippocampus (2, 3) suggested 
one potential locus for this map. Place cells have spatially localized firing fields which reorganize 
dramatically when the environment changes (4). Another potential locus for the cognitive map 
of space has been uncovered in the main input to hippocampus, a structure known as the medial 
entorhinal cortex (MEC) (5, 6). When rats freely explore a two dimensional open environment, 
individual "grid cells" in the MEC display spatial firing fields that form a periodic triangular grid
which tiles space (Fig. 1A). It is believed that grid fields provide relatively rigid coordinates on
space based partly on self-motion and partly on environmental cues (7). Locally within the MEC,
grid cells share the same orientation and periodicity, but vary randomly in phase (6). The scale of
grid fields varies systematically along the dorso-ventral axis of the MEC (Fig. 1A) (6, 8). 

How does the grid system represent spatial location and what function does the striking trian- 
gular lattice organization and systematic variation in grid scale serve? Here, we begin by assuming 
that grid cell scales are organized into discrete modules (8), and propose that the grid system fol- 
lows a principle of economy by minimizing the number of neurons required to achieve a given 
spatial resolution. Our hypothesis, together with general assumptions about tuning curve shape 
and decoding mechanism, predicts a geometric progression of grid scales. The theory further 
determines the mean ratio between scales, explains the triangular lattice structure of grid cell fir- 
ing maps, and makes several additional predictions that can be subjected to direct experimental 
test. For example, the theory predicts that the ratio of adjacent grid scales will be modestly variable
within and between animals, with a mean in the range 1.4-1.7 depending on the assumed
decoding mechanism used by the brain. This prediction is quantitatively supported by recent experiments
(8, 13). In a simple decoding scheme, the scale ratio in an n-dimensional environment is
predicted to be close to $\sqrt[n]{e}$. We also estimate the total number of scales providing the spatial resolution
necessary to support navigation over typical behavioral distances, and show that it compares
favorably with estimates from recent experimental measurements (8). 



Results 

General grid coding in one dimension 

Consider a one dimensional grid system that develops when an animal runs on a linear track. 
Suppose that grid fields develop at a discrete set of periodicities $\lambda_1 > \lambda_2 > \cdots > \lambda_m$ (Fig. 1A).
We will refer to the population of grid cells sharing each periodicity $\lambda_i$ as one module. It will
prove convenient to define "scale factors" $r_i = \lambda_i/\lambda_{i+1}$. Here $\lambda_1$ could be the length of the entire
track and we do not assume any further relation between the $\lambda_i$, such as a common scale ratio (i.e.,
in general $r_1 \neq r_2 \neq \cdots \neq r_{m-1}$). Now let the widths of grid fields in each module be denoted
$l_1, l_2, \ldots, l_m$. Within any module, grid cells have a variety of spatial phases so that at least one cell
may respond at any physical location (Fig. 1D). To give uniform coverage of space, the number
of grid cells $n_i$ at scale $i$ should be proportional to $\lambda_i/l_i$; thus we write $n_i = d\lambda_i/l_i$ in terms of a
"coverage factor" $d$ that represents the number of grid fields overlapping each point in space. We
assume that $d$ is the same at each scale. In terms of these parameters, the total number of grid cells
is $N = \sum_{i=1}^{m} n_i = d \sum_{i=1}^{m} \lambda_i/l_i$.

Grid cells with smaller scales provide more local spatial information than those with larger
scales, owing to their smaller $l_i$. However, this increased resolution comes at a cost: the smaller
periodicity $\lambda_i$ of these cells leads to increased ambiguity (Fig. 1C,E, Fig. 2A-D). In this paper, we
study coding schemes in which information from grid cells with larger scales is used to resolve
this ambiguity in the smaller scales, while the smaller scales provide improved local resolution
(Fig. 1E). In such a system, resolution may thus be improved by increasing the total number of

Figure 1: Representing place in the grid system. (A) Grid cells (small triangles) in the medial
entorhinal cortex (MEC) respond when the animal is in a triangular lattice of physical locations
(red circles; sometimes also called a "hexagonal lattice") (5, 6). The scale of periodicity (the
"grid scale", $\lambda_i$) and the size of the regions evoking a response (the "grid field width", $l_i$) vary
systematically along the dorso-ventral axis of the MEC (6). (B) A simplified binary grid scheme
for encoding location along a linear track. At each scale ($\lambda_i$) there are two grid cells (red vs.
blue firing fields). The periodicity and grid field widths are halved at each successive scale. (C)
Decoding is ambiguous if the grid field width at scale $i$ exceeds the grid periodicity at scale $i+1$.
E.g., if the grid fields marked in red respond at scales $i$ and $i+1$, the animal might be in either
of the two marked locations. (D) We extend the binary code of panel B to the more realistic case
of populations of noisy neurons with overlapping tuning curves. (E) The relationship between
grid periodicity, $\lambda_i$, and grid field width, $l_i$. In the winner-take-all case, decoded position will be
ambiguous unless $l_i \leq \lambda_{i+1}$, analogously to the situation depicted in panel C.

Figure 2: (A-D) Trade-off between precision and ambiguity in the Bayesian decoder. (A) Information
about position given the responses of all grid cells at scales larger than that of module $i$ is
summarized by the posterior $Q_{i-1}(x)$ (black curve), and the uncertainty in position is given by
the standard deviation $\delta_{i-1}$. Grid cells in module $i$ contribute the periodic posterior $P_i(x)$ (green
curve). (B) The updated posterior combining module $i$ with all larger-scale modules is given by
the product $Q_i(x) \propto P_i(x) Q_{i-1}(x)$, and has the reduced uncertainty $\delta_i$. (C) Precision is improved
by increasing the scale factor, thereby narrowing the peaks of $P_i(x)$. However, the periodicity
shrinks as well, increasing ambiguity. (D) Posterior $Q_i(x)$ given by combining the modules shown
in C. Ambiguity from the secondary peaks leads to an overall uncertainty $\delta_i$ larger than in B, despite
the improved precision from the narrower central peak. There is thus an optimal scale factor
somewhere between that in A, B and in C, D. (E) The optimal ratio $r$ between adjacent scales in a
hierarchical grid system in one dimension for a simple winner-take-all decoding model (blue curve,
WTA) and a Bayesian decoder (red curve). Here $N_r$ is the number of neurons required to represent
space with resolution $R$ given a scaling ratio $r$, and $N_{\min}$ is the number of neurons required at the
optimum. In both decoding models, the ratio $N_r/N_{\min}$ is independent of resolution, $R$. For the
winner-take-all model, $N_r \propto r/\ln r$, as derived in the main text, and the curve for the Bayesian
model is derived numerically as described in Supplemental Sec. 5. The winner-take-all model
predicts that the minimum number of neurons is achieved for $r = e \approx 2.7$, while the Bayesian
decoder predicts $r \approx 2.3$. The minima of the two curves lie within each other's shallow basins. (F)
Same as E, but in two dimensions with a triangular grid. The winner-take-all curve in this case is
$N_r \propto r^2/\ln(r^2)$ (see main text), and the minima occur at $r = \sqrt{e} \approx 1.65$ for winner-take-all and
$r \approx 1.44$ for the Bayesian case. The shallowness of the basins around these minima predicts that
some variability of adjacent scale ratios is tolerable, both within and between animals.



modules $m$. Alternatively, the field widths $l_i$ may be made smaller relative to the periodicities $\lambda_i$;
however, this necessitates using more neurons at each scale in order to maintain the same coverage
$d$. Improving resolution by either mode therefore requires additional neurons. An efficient grid
system will minimize the number of grid cells providing a fixed resolution $R$; we shall demonstrate
how the parameters of the grid system, $r_i$, $l_i/\lambda_i$, and $m$, should be chosen to achieve this optimal
coding. We will characterize efficient grid systems in the context of two decoding methods at
extremes of complexity.

We first consider a decoder that treats the animal as localized within the grid field of the
most responsive cell in each module (9, 10). Such a "winner-take-all" scheme is at one extreme of
decoding complexity and could be easily implemented by neural circuits. Any decoder will have to
threshold grid cell responses at the background noise level, so that the firing fields are effectively
compact (Fig. 1D). Grid cell recordings suggest that the firing fields are, indeed, compact (6). The
uncertainty in the animal's location at grid scale $i$ is given by the grid field width $l_i$. The smallest
scale that can be resolved in this way is $l_m$; we therefore define the resolution of the grid system
as the ratio of the largest to the smallest scale, $R_1 = \lambda_1/l_m$. In terms of scale factors $r_i = \lambda_i/\lambda_{i+1}$, we
can write the resolution as $R_1 = \prod_{i=1}^{m} r_i$, where we also defined $r_m = \lambda_m/l_m$. Unambiguous decoding
requires that $l_i \leq \lambda_{i+1}$ (Fig. 1C,E), or, equivalently, $\lambda_i/l_i \geq r_i$. To minimize $N = d \sum_i \lambda_i/l_i$, all the
$\lambda_i/l_i$ should be as small as possible; so this fixes $\lambda_i/l_i = r_i$. Thus we are reduced to minimizing the sum
$N = d \sum_{i=1}^{m} r_i$ over the parameters $r_i$, while fixing the product $R = \prod_{i=1}^{m} r_i$. Because this problem
is symmetric under permutation of the indices $i$, the optimal $r_i$ turn out to all be equal, allowing us
to set $r_i = r$ (Supplementary Material). Our optimization principle thus predicts a common scale
ratio, giving a geometric progression of grid periodicities. The constraint on resolution then gives
$m = \log_r R$, so that we seek to minimize $N(r) = d\,r \log_r R$ with respect to $r$: the solution is $r = e$
(Fig. 2E; details in Supplementary Material). Therefore, for each scale $i$, $\lambda_i = e\,\lambda_{i+1}$ and $\lambda_i = e\,l_i$.
Here we treated $N$ and $m$ as continuous variables; treating them as integers throughout leads to
the same result through a more involved argument (Supplementary Material). The coverage factor
$d$ and the resolution $R$ do not appear in the optimal ratio of scales.
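
As an illustrative numerical check (not part of the original analysis), the following Python sketch evaluates $N(r) = d\,r \log_r R$ on a grid of candidate ratios and locates the minimum. The function names `neuron_count` and `best_ratio`, the search range, and the sample values of $R$ and $d$ are assumptions chosen only for illustration.

```python
import math

def neuron_count(r, resolution, coverage=1.0):
    """N(r) = d * r * log_r(R) for the 1-D winner-take-all scheme."""
    return coverage * r * math.log(resolution) / math.log(r)

def best_ratio(resolution, coverage=1.0, lo=1.05, hi=10.0, steps=100_000):
    """Grid search (assumed range [lo, hi]) for the ratio r minimizing N(r)."""
    candidates = (lo + (hi - lo) * k / steps for k in range(steps + 1))
    return min(candidates, key=lambda r: neuron_count(r, resolution, coverage))

# Hypothetical sample values of the resolution R and coverage factor d.
for R, d in [(1e2, 1.0), (1e4, 3.0), (1e6, 10.0)]:
    print(f"R={R:g}, d={d:g}: optimal r ~ {best_ratio(R, d):.3f}")
```

Under these assumptions the minimizing ratio stays close to $e$ for every choice of $R$ and $d$, as expected, since both quantities enter $N(r)$ only as multiplicative constants.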

The brain might implement the simple decoding scheme above via a winner-take-all mecha- 
nism (9-11). But the brain is also capable of implementing far more complex decoders. Hence, 
we also consider a Bayesian decoding scheme that optimally combines information from all grid 
modules. In such a setting, an ideal decoder should construct the posterior probability distribution 
of the animal's location given the noisy responses of all grid cells. The population response at each 
scale $i$ will give rise to a posterior over location $P(x|i)$, which will have the same periodicity $\lambda_i$
as the individual grid cells' firing rates (Fig. 2A). The posterior given all $m$ scales, $Q_m(x)$, will be
given by the product $Q_m(x) = \mathcal{N} \prod_{i=1}^{m} P(x|i)$, assuming independent response noise across scales
(Fig. 2B). Here $\mathcal{N}$ is a normalization factor. The animal's overall uncertainty about its position
will then be related to the standard deviation $\delta_m$ of $Q_m(x)$; we therefore quantify resolution as
$R = \lambda_1/\delta_m$. $\delta_m$, and therefore $R$, will be a function of all the grid parameters (Supplementary Material).
In this framework, ambiguity from too-small periodicity $\lambda_i$ decreases resolution, as does
imprecision from too-large field width $l_i$. We thus need not impose an a priori constraint on the
minimum value of $\lambda_i$, as we did in the winner-take-all case: minimizing neuron number while fixing
resolution automatically resolves the tradeoff between precision and ambiguity (Fig. 2A-D). To
calculate the resolution explicitly, we note that when the coverage factor $d$ is very large, the distributions
$P(x|i)$ will be well-approximated by periodic arrays of Gaussians (even though individual
tuning curves need not be Gaussian). We can then minimize the neuron number, fixing resolution,
to obtain the optimal scale factor $r \approx 2.3$: slightly smaller than, but close to the winner-take-all
value, $e$ (Fig. 2E; details in Supplementary Material). As before, the optimal scale factors are all
equal so we again predict a geometric progression of scales.

It is apparent from Fig. 2E that the minima for both the Bayesian decoder and the winner- 
take-all decoder are shallow, so that the scaling ratio r may lie anywhere within a basin around 
the optimum at the cost of a small number of additional neurons. Even though our two decoding 
strategies lie at extremes of complexity (one relying just on the most active cell at each scale and 
another optimally pooling information in the grid population) their respective "optimal intervals" 
substantially overlap. That these two very different models make overlapping predictions suggests 
that our theory is robust to variations in the detailed shape of grid cells' grid fields and the precise 
decoding model used to read their responses. Moreover, such considerations also suggest that these 
coding schemes have the capacity to tolerate developmental noise: different animals could develop 
grid systems with slightly different scaling ratios, without suffering a large loss in efficiency. 

General grid coding in two dimensions 

How do these results extend to two dimensions? Let $\lambda_i$ be the distance between neighboring
peaks of grid fields of width $l_i$ (Fig. 1A). Assume in addition that a given cell responds on a
lattice whose vertices are located at the points $\lambda_i(n\mathbf{u} + m\mathbf{v})$, where $n, m$ are integers and $\mathbf{u}, \mathbf{v}$ are
linearly independent vectors generating the lattice (Fig. 3B). We may take $\mathbf{u}$ to have unit length
($|\mathbf{u}| = 1$) without loss of generality; however, $|\mathbf{v}| \neq 1$ in general. It will prove convenient to denote
the components of $\mathbf{v}$ parallel and perpendicular to $\mathbf{u}$ by $v_\parallel$ and $v_\perp$, respectively (Fig. 3B). The
two numbers $v_\parallel, v_\perp$ quantify the geometry of the grid and are additional parameters that we may
optimize over: this is a primary difference from the one-dimensional case. We will assume that
$v_\parallel$ and $v_\perp$ are independent of scale; this still allows for relative rotation between grids at different
scales.

At each scale, grid cells have different phases so that at least one cell responds at each physical
location. The minimal number of phases required to cover space is computed by dividing the area
of the unit cell of the grid ($\lambda_i^2 \|\mathbf{u} \times \mathbf{v}\| = \lambda_i^2 |v_\perp|$) by the area of the grid field. As in the one-dimensional
case, we define a coverage factor $d$ as the number of neurons covering each point in
space, giving for the total number of neurons $N = d|v_\perp| \sum_i (\lambda_i/l_i)^2$.

As before, consider a simple model where grid fields lie completely within compact regions
and assume a decoder which selects the most activated cell (9-11). In such a model, each scale $i$
serves to localize the animal within a circle of diameter $l_i$. The spatial resolution is summarized
by the square of the ratio of the largest scale $\lambda_1$ to the smallest scale $l_m$: $R_2 = (\lambda_1/l_m)^2$. In terms
of the scale factors $\tilde{r}_i = \lambda_i/\lambda_{i+1}$ we write $R_2 = \prod_{i=1}^{m} \tilde{r}_i^2$, where we also define $\tilde{r}_m = \lambda_m/l_m$. To
decode the position of an animal unambiguously, each cell at scale $i$ should have at most one grid
field within a region of diameter $l_{i-1}$. Since the nearest firing fields lie at distances $\lambda_i$, $\lambda_i|\mathbf{v}|$ and
$\lambda_i|\mathbf{u} - \mathbf{v}|$ along the three grid axes $\mathbf{u}$, $\mathbf{v}$, and $\mathbf{u} - \mathbf{v}$, we require
$\min(|\mathbf{v}|, |\mathbf{u} - \mathbf{v}|, 1) \cdot \lambda_i \geq l_{i-1}$ in order to avoid ambiguity
(Fig. 3C). To minimize $N$ we must make $\lambda_{i-1}/l_{i-1} = \tilde{r}_{i-1}\lambda_i/l_{i-1}$ as small as possible, so that
$\lambda_i = l_{i-1}$, which is only possible if $|\mathbf{v}| \geq 1$, $|\mathbf{u} - \mathbf{v}| \geq 1$. We then have $N = d|v_\perp| \sum_i \tilde{r}_i^2$. We now seek

Figure 3: (A) Two dimensional analog of a grid scheme with circular firing fields. (B) A general
two-dimensional lattice may be parameterized by two vectors $\mathbf{u}$ and $\mathbf{v}$ and a periodicity parameter
$\lambda_i$. We take $\mathbf{u}$ to be a unit vector, so that the spacing between peaks along the $\mathbf{u}$ direction is
$\lambda_i$, and denote the two components of $\mathbf{v}$ by $v_\parallel, v_\perp$. The blue-bordered region is a fundamental
domain of the lattice, the largest spatial region that may be unambiguously represented. (C) The
two dimensional analog of the ambiguity in Fig. 1C, E for the winner-take-all decoder. If the
grid fields in scale $i$ are too close to each other relative to the size of the grid field of scale $i-1$
(i.e. $l_{i-1}$), the animal might be in one of several locations. (D) Contour plot of normalized neuron
number $N/N_{\min}$ in the Bayesian decoder, as a function of the grid geometry parameters $v_\perp, v_\parallel$ after
minimizing over the scale factors for fixed resolution $R$. As in Fig. 2E,F, the normalized neuron
number is independent of $R$. The spacing between contours is 0.01, and the asterisk labels the
minimum at $v_\parallel = 1/2$, $v_\perp = \sqrt{3}/2$; this corresponds to the triangular lattice.



parameters $v_\parallel, v_\perp, \tilde{r}_i$ that minimize $N$ while fixing the resolution $R_2$. Since $R_2$ does not depend
on the geometric parameters $v_\parallel, v_\perp$, we may determine these parameters by simply minimizing
$N$, which is equivalent to minimizing $|v_\perp|$ subject to the constraints $|\mathbf{v}| \geq 1$, $|\mathbf{u} - \mathbf{v}| \geq 1$. This
optimization picks out the triangular lattice with $v_\perp = \sqrt{3}/2$, $v_\parallel = 1/2$. Note that this formulation
is mathematically analogous to the optimal sphere-packing problem, for which the solution in
two dimensions is also the triangular lattice (22). As for the scale factors $\tilde{r}_i$, the optimization
problem is mathematically the same as in one dimension if we formally set $r_i = \tilde{r}_i^2$. This gives
the optimal ratio $\tilde{r}_i^2 = e$ for all $i$ (Fig. 2F). We conclude that in two dimensions, the optimal ratio
of neighboring grid periodicities is $\sqrt{e} \approx 1.65$ for the simple winner-take-all decoding model, and
the optimal lattice is triangular.
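
The geometric part of this optimization can be illustrated with a brute-force search. The Python sketch below is only a rough check under stated assumptions: it fixes $\mathbf{u} = (1, 0)$, restricts $v_\parallel$ to $[0, 1]$ (adding integer multiples of $\mathbf{u}$ to $\mathbf{v}$ leaves the lattice unchanged), takes $v_\perp > 0$, and scans a discrete grid whose spacing limits the accuracy. The names `is_unambiguous` and `smallest_v_perp` are hypothetical.

```python
import math

def is_unambiguous(v_par, v_perp):
    """Check the constraints |v| >= 1 and |u - v| >= 1 for u = (1, 0)."""
    return (math.hypot(v_par, v_perp) >= 1.0
            and math.hypot(1.0 - v_par, v_perp) >= 1.0)

def smallest_v_perp(steps=500):
    """Brute-force search for the feasible (v_par, v_perp) with minimal v_perp."""
    best = None
    for i in range(steps + 1):
        v_par = i / steps                  # assumed range: 0 <= v_par <= 1
        for j in range(1, steps + 1):
            v_perp = j / steps             # scan upward from small v_perp
            if is_unambiguous(v_par, v_perp):
                if best is None or v_perp < best[1]:
                    best = (v_par, v_perp)
                break                      # larger v_perp only adds neurons
    return best

v_par_opt, v_perp_opt = smallest_v_perp()
print(v_par_opt, v_perp_opt, math.sqrt(3) / 2)
```

Up to the grid spacing, the search lands near $v_\parallel = 1/2$, $v_\perp = \sqrt{3}/2$, the triangular lattice.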

The Bayesian decoding model can also be extended to two dimensions with the posterior distributions
$P(x|i)$ becoming sums of Gaussians with peaks on the two-dimensional lattice. In analogy
with the one-dimensional case, we then derive a formula for the resolution $R_2 = \lambda_1/\delta_m$ in terms
of the standard deviation $\delta_m$ of the posterior given all scales. $\delta_m$ may be explicitly calculated as
a function of the scale factors $\tilde{r}_i$ and the geometric factors $v_\parallel, v_\perp$, and the minimization of neuron
number may then be carried out numerically (Supplementary Material). In this approach the
optimal scale factor turns out to be $\tilde{r}_i \approx 1.4$ (Fig. 2F), and the optimal lattice is again triangular
(Fig. 3D).

Once again, the optimal scale factors in both decoding approaches lie within overlapping shal- 
low basins, indicating that our proposal is robust to variations in grid field shape and to the precise 
decoding algorithm (Fig. 2F). In two dimensions, the required neuron number will be no more
than 5% above the minimum if the scale factor is within (1.43, 1.96) for the winner-take-all model
and (1.28, 1.66) for the Bayesian model. These "optimal intervals" are narrower than in the
one-dimensional case, and have substantial overlap.
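
For the winner-take-all model the quoted interval can be checked directly from $N_r \propto r^2/\ln(r^2)$. The following Python sketch is a minimal illustration, assuming a simple bisection on each side of the optimum $r = \sqrt{e}$; the helper names and the bracketing values 1.01 and 5.0 are assumptions. The Bayesian interval requires the numerical model of the Supplementary Material and is not reproduced here.

```python
import math

def relative_cost_2d(r):
    """N_r / N_min for the 2-D winner-take-all model: (r^2 / ln r^2) / e."""
    return (r * r / math.log(r * r)) / math.e

def bisect(f, lo, hi, tol=1e-10):
    """Root of f on [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    sign_lo = f(lo) > 0
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if (f(mid) > 0) == sign_lo:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

def tolerance_interval(excess=0.05):
    """Scale ratios whose neuron cost is within `excess` of the minimum."""
    r_opt = math.sqrt(math.e)
    g = lambda r: relative_cost_2d(r) - (1.0 + excess)
    return bisect(g, 1.01, r_opt), bisect(g, r_opt, 5.0)

r_low, r_high = tolerance_interval()
print(f"5% interval: ({r_low:.2f}, {r_high:.2f})")
```

The endpoints obtained this way agree with the interval (1.43, 1.96) stated for the winner-take-all model.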

The fact that both of our decoding models predicted the triangular lattice as optimal is a con- 
sequence of the fact that they share a very general symmetry. The resolution formula in both 
problems is invariant under a common rotation and a common rescaling of all firing rate maps. 
The neuron number shares this symmetry, as well. The rotation invariance implies that the resolution
only depends on grid geometry through $v_\perp, v_\parallel$, and the rescaling invariance implies that it
only depends on $\lambda_i, l_i$ through the dimensionless ratios $\tilde{r}_i$ and $\lambda_i/l_i$. However, even after restricting
the parameters in this way, the rotation- and rescaling-invariance has a nontrivial consequence.
The transformation $v_\perp \to -v_\perp/|\mathbf{v}|^2$, $v_\parallel \to v_\parallel/|\mathbf{v}|^2$, $l_i \to l_i/|\mathbf{v}|$ can be seen to be equivalent to
a rotation of the grid combined with a scaling by $|\mathbf{v}|$ (Supplementary Material), and therefore
must leave the resolution and neuron number invariant. If there is a unique optimal grid, it must
then also be invariant under this transformation: this constraint is only satisfied by the square grid
($v_\perp = 1, v_\parallel = 0$) and the triangular grid ($v_\perp = \sqrt{3}/2, v_\parallel = 1/2$). Between these two, the triangular
grid has the smaller $v_\perp$ and so will minimize neuron number (see Supplementary Material for a
more rigorous discussion). We therefore see that the optimality of the triangular lattice is a very 
general consequence of minimizing neuron number for fixed resolution, and expect the result to 
hold for a wide range of decoders. 

Figure 4: (A) Our models predict grid scaling ratios that are consistent with experiment. 'WTA'
(Winner-Take-All) and 'Bayesian' represent predictions from two decoding models; the dot is the
scaling ratio minimizing neuron number and the error bars represent the interval within which the
neuron number will be no more than 5% higher than the minimum. For the experimental data, the
dot represents the mean measured scale ratio and the error bars represent ± one standard deviation.
Data were replotted from (8, 13). The dashed red line shows a consensus value running through
the two theoretical predictions and the two experimental datasets. (B) The mean ratio between grid
periodicity ($\lambda_i$) and the diameter of grid fields ($l_i$) in mice (replotted from (14)). Error bars indicate
± one S.E.M. For both wild type mice and HCN knockouts (which have larger grid periodicities)
the ratio is consistent with $\sqrt{e}$ (dashed red line). (C) The response lattice of grid cells in rats forms
an equilateral triangular lattice with 60° angles between adjacent lattice edges (replotted from (6),
n = 45 neurons from 6 rats). Dots represent the outliers.



Comparison to experiment 

Our predictions agree with experiment (8, 13, 14) (see Supplementary Material for details of the
data re-analysis). Specifically, Barry et al., 2007 (Fig. 4A) reported the grid periodicities measured
at three locations along the dorso-ventral axis of the MEC in rats and found ratios of $\sim 1$, $\sim 1.7$
and $\sim 2.5 \approx 1.6 \times 1.6$ relative to the smallest period (13). The ratios of adjacent scales reported
in (13) had a mean of 1.64 ± 0.09 (mean ± std. dev., n = 6), which almost precisely matches the
mean scale factor of $\sqrt{e}$ predicted from the winner-take-all decoding model, and is also consistent
with the Bayesian decoding model. Recent analysis based on a larger data set (8) confirms the
geometric progression of the grid scales. The mean adjacent scale ratio is 1.42 ± 0.17 (mean ± std.
dev., n = 24) in that data set, accompanied by modest variability of the scaling factors both within
and between animals. These measurements again match both our models (Fig. 4A). The optimal
grid was triangular in both of our models; this again matches measurements (Fig. 4C) (6-8).

The winner-take-all model also predicts the ratio between grid period and grid field width:
$\lambda_i/l_i = \lambda_i/\lambda_{i+1} = \sqrt{e} \approx 1.65$. A recent study measured the ratio between grid periodicity and
grid field size to be 1.63 ± 0.035 (mean ± S.E.M., n = 48) in wild type mice (14), consistent with
our predictions (Fig. 4B). This ratio was unchanged, 1.66 ± 0.03 (mean ± S.E.M., n = 86), in
HCN1 knockout strains whose absolute grid periodicities increased relative to the wild type (14).
The Bayesian model does not make a direct prediction about grid field width; it instead works
with the standard deviation of the posterior $P(x|i)$, $\sigma_i$ (Supplementary Material). This parameter
is predicted to be $\sigma_i = 0.19\lambda_i$ in two dimensions, but cannot be directly measured from data. It
is related to the field width $l_i$ by a proportionality factor whose value depends on detailed tuning
curve shape, noise properties, firing rate, and firing field density (Supplementary Material).

We can estimate the total number of modules, $m$, by estimating the requisite resolution $R_2$ and
using the relationship $m = \log R_2/\log \tilde{r}^2$. Assuming that the animal must be able to navigate
an environment of area $\sim (10\,\mathrm{m})^2$, with a positional accuracy on the scale of the rat's body size,
$\sim (10\,\mathrm{cm})^2$, we get a resolution of $R_2 \sim 10^4$. Together with the predicted two-dimensional scale
factor $\tilde{r}$, this gives $m \approx 10$ as an order-of-magnitude estimate. Indeed, in (8), 4-5 modules were
discovered in recordings spanning up to 50% of the dorsoventral extent of MEC; extrapolation
gives a total module number consistent with our estimate.
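
The arithmetic behind this estimate can be spelled out in a few lines of Python. This is a minimal sketch: the function name `module_count` is hypothetical, the 10 m range and 10 cm accuracy are the values assumed in the text, and the two scale ratios are the winner-take-all and Bayesian optima quoted earlier.

```python
import math

def module_count(range_m, accuracy_m, scale_ratio):
    """m = log(R_2) / log(r^2), with R_2 the ratio of areas to be resolved."""
    r2_resolution = (range_m / accuracy_m) ** 2
    return math.log(r2_resolution) / math.log(scale_ratio ** 2)

m_wta = module_count(10.0, 0.10, math.sqrt(math.e))  # winner-take-all optimum
m_bayes = module_count(10.0, 0.10, 1.44)             # Bayesian optimum
print(f"WTA: m ~ {m_wta:.1f}, Bayesian: m ~ {m_bayes:.1f}")
```

Both ratios give on the order of ten modules, so the order-of-magnitude estimate does not depend strongly on which decoder is assumed.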



Discussion 

We have shown that a grid system with a discrete set of periodicities, as found in the entorhinal 
cortex, should use a common scale factor r between modules to represent spatial location with the 
fewest neurons. In one dimension, this organization may be thought of intuitively as implementing
a neural analog of a base-b number system. Each scale localizes the animal to some coarse region of
the environment, and the next scale subdivides that region into $b = r$ "bins" (Fig. 1C). Our problem
of minimizing neuron number while fixing resolution is analogous to minimizing the number of
symbols needed to represent a given range $R$ of numbers in a base-b number system. Specifically,
$b$ symbols are required at each of $\log_b R$ positions, and minimizing the total, $b \log_b R$, with respect
to $b$ gives an optimal base $b = e$. Our full theory can thus be seen as a generalization of this simple
fixed-base representational scheme to noisy neurons encoding two-dimensional location.
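
The number-system analogy can be made concrete by restricting the base to integers. The Python sketch below is illustrative only: `symbols_needed`, the range $R = 10^4$, and the candidate bases 2 through 10 are assumptions. It tabulates the cost $b \log_b R$ for each integer base.

```python
import math

def symbols_needed(base, resolution):
    """Total symbol count b * log_b(R) for a fixed-base code."""
    return base * math.log(resolution) / math.log(base)

# Hypothetical range R = 1e4; any R > 1 rescales all costs by the same factor.
costs = {b: symbols_needed(b, 1e4) for b in range(2, 11)}
best_integer_base = min(costs, key=costs.get)
for b, cost in costs.items():
    print(b, round(cost, 2))
print("cheapest integer base:", best_integer_base)
```

Among integer bases, 3 is the cheapest, being the integer closest to $e$, and the cost rises only slowly on either side, which mirrors the shallow basin around the optimal scale ratio.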

The existing data agree with our predictions for the ratios of adjacent scales within the variabil- 
ity tolerated by our models (Fig. 4). Further tests of our theory are possible. For example, a direct 
generalization of our reasoning says that in n-dimensions the optimal ratio between grid scales
will be near $\sqrt[n]{e}$, with n = 3 having possible relevance to the grid system (15) in, e.g., bats (16).
In general, the theory can be tested by comprehensive population recordings of grid cells along 
the dorso-ventral axis for animals moving in one, two and three dimensional environments. There 
is some evidence that humans also have a grid system (17), in which case our theory may have 
relevance to the human sense of place. 

We assumed that the grid system should minimize the number of neurons required to achieve 
a given spatial resolution. In fact, any cost which increases monotonically with the number of 
neurons would lead to the same optimum. Of course, completely different proposals for the functional
architecture of the grid system (18, 19, 23) and associated cost functions will lead to different
predictions. For example, (18, 19) showed that a grid implementing a "residue number system" 
(in which adjacent grid scales should be relatively prime) will maximize the range of positions 
that can be encoded. This theory makes distinct predictions for the ratios of adjacent scales (the 
different periods are relatively prime) and, in its original form, predicts neither the ratio of grid 
field width to periodicity nor the organization in higher dimensions, except perhaps by interpreting 
higher dimensional grid fields as a product of one-dimensional fields. The essential difference be- 
tween these two theories lies in the fundamental assumptions: we minimize the number of neurons 
needed to represent space with a given resolution and range, as opposed to maximizing the range 
of locations that may be uniquely encoded. 

Grid coding schemes represent position more accurately than place cell codes given a fixed 
number of neurons (20, 21). Furthermore, in one dimension a geometric progression of grids that 
are self-similar at each scale minimizes the asymptotic error in recovering an animal's location 
given a fixed number of neurons (20). The two dimensional grid schemes discussed in this paper 
will share the same virtue. 

The scheme that we propose may also be more developmentally plausible, as each scale is 
determined by applying a fixed rule (rescaling by r) to the anatomically adjacent scale. This could 
be encoded, for example, by a morphogen with an exponentially decaying concentration gradient 
along the dorsoventral axis, something readily attainable in standard models of development. This 
differs from the global constraint that all scales be relatively prime, for which the existence of a 
local developmental rule is less plausible. As we showed, the location coding scheme that we have 
described is also robust to variations in the precise value of the scale ratio r, and so would tolerate 
variability within and between animals. 
Supplementary materials
1 Optimizing a "base-b" representation of one-dimensional space 

Suppose that we want to resolve location with a precision $l$ in a track of length $L$. In terms of the resolution
$R = L/l$, we have argued in the discussion of the main text that a "base-b" hierarchical neural coding
scheme will roughly require $N = b \log_b R$ neurons. To derive the optimal base (i.e. the base that minimizes
the number of the neurons), we evaluate the extremum $\partial N/\partial b = 0$:

$$\frac{\partial N}{\partial b} = \frac{\partial (b \log_b R)}{\partial b} = \frac{\partial}{\partial b}\left(\frac{b \ln R}{\ln b}\right) = \ln R \, \frac{\ln b - 1}{(\ln b)^2} \qquad (1)$$

Setting $\partial N/\partial b = 0$ gives $\ln b - 1 = 0$. Therefore the number of neurons is extremized when $b = e$. It is
easy to check that this is a minimum.



2 Optimizing the grid system: winner- take-all decoder 
2.1 Lagrange multiplier approach 

We saw in the main text that, for a winner-take-all decoder, the problem of deriving the optimal ratios of adjacent
grid scales in one dimension is equivalent to minimizing the sum of a set of numbers ($N = d \sum_{i=1}^{m} r_i$)
while fixing the product ($R_1 = \prod_{i=1}^{m} r_i$) to take the fixed value $R$. Mathematically, it is equivalent to
minimize $N$ while fixing $\ln R_1$. When $N$ is large we can treat it as a continuous variable and use the
method of Lagrange multipliers as follows. First, we construct the auxiliary function
$H(r_1, \ldots, r_m, \beta) = N - \beta(\ln R_1 - \ln R)$ and then extremize $H$ with respect to each $r_i$ and $\beta$.
Extremizing with respect to $r_i$ gives

$$\frac{\partial H}{\partial r_i} = d - \frac{\beta}{r_i} = 0 \implies r_i = \frac{\beta}{d} \equiv r. \qquad (2)$$

Next, extremizing with respect to $\beta$ to implement the constraint on the resolution gives

$$\frac{\partial H}{\partial \beta} = -(\ln R_1 - \ln R) = -(m \ln r - \ln R) = 0 \implies r = R^{1/m}. \qquad (3)$$

Having thus implemented the constraint that $\ln R_1 = \ln R$, it follows that $H = N = d\,m\,R^{1/m}$. Alternatively,
solving for $m$ in terms of $r$, we can write $H = d\,r\,(\ln R)/(\ln r) = d\,r \log_r R$. It remains to minimize
the number of cells $N$ with respect to $r$,

$$\frac{\partial N}{\partial r} = d \ln R \left[\frac{1}{\ln r} - \left(\frac{1}{\ln r}\right)^2\right] = 0 \implies \ln r = 1. \qquad (4)$$

This in turn implies our result

$$r = e \qquad (5)$$

for the optimal ratio between adjacent scales in a hierarchical, grid coding scheme for position in one 
dimension, using a winner-take-all decoder. In this argument we employed the sleight of hand that N and 
m can be treated as continuous variables, which is approximately valid when N is large. This condition 
obtains if the required resolution R is large. A more careful argument is given below that preserves the 
integer character of N and m. 
2.2 Integer N and m

As discussed above, we seek to minimize the sum of a set of numbers ($N = d \sum_{i=1}^{m} r_i$) while fixing
the product ($R = \prod_{i=1}^{m} r_i$) to take a fixed value. We wish to carry out this minimization while
recognizing that the number of neurons is an integer. First, consider the arithmetic mean-geometric mean
inequality which states that, for a set of non-negative real numbers, $x_1, x_2, \ldots, x_m$, the following holds:

$$(x_1 + x_2 + \cdots + x_m)/m \geq (x_1 x_2 \cdots x_m)^{1/m}, \tag{6}$$

with equality if and only if all the $x_i$'s are equal. Applying this inequality, it is easy to see that to
minimize $\sum_{i=1}^{m} r_i$, all of the $r_i$ should be equal. We denote this common value as $r$, and we can
write $r = R^{1/m}$. Therefore, we have

$$N = d \sum_{i=1}^{m} r = m\,d\,R^{1/m}. \tag{7}$$

Suppose $R = e^{z+\epsilon}$, where $z$ is an integer, and $\epsilon \in [0, 1)$. By taking the first derivative
of $N$ with respect to $m$, and setting it to zero, we find that $N$ is minimized when $m = z + \epsilon$.
However, since $m$ is an integer the minimum will be achieved either at $m = z$ or $m = z + 1$. (Here we used
the fact that $m R^{1/m}$ is monotonically decreasing between $0$ and $z + \epsilon$ and monotonically
increasing between $z + \epsilon$ and $\infty$.) Thus, minimizing $N$ requires either

$$r = (e^{z+\epsilon})^{1/z} = e^{(z+\epsilon)/z} \quad \text{or} \quad r = (e^{z+\epsilon})^{1/(z+1)} = e^{(z+\epsilon)/(z+1)}. \tag{8}$$

In either case, when $z$ is large (and therefore $R$, $N$ and $m$ are large), $r \to e$. This shows that when the
resolution $R$ is sufficiently large, the total number of neurons $N$ is minimized when $r_i \approx e$ for all $i$.
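The integer argument can be illustrated with a short Python sketch. The function names (`neuron_count`, `optimal_integer_m`), the default $d = 1$, and the sample resolutions are assumptions made for this example. The sketch compares the two integer neighbours of $\ln R$ and reports the resulting common ratio $r = R^{1/m}$.

```python
import math

def neuron_count(m, R, d=1.0):
    """N = m * d * R^(1/m) for m scales sharing the common ratio r = R^(1/m)."""
    return m * d * R ** (1.0 / m)

def optimal_integer_m(R, d=1.0):
    """Pick the better of the two integers adjacent to ln(R)."""
    z = math.floor(math.log(R))
    candidates = [m for m in (z, z + 1) if m >= 1]
    return min(candidates, key=lambda m: neuron_count(m, R, d))

# Sample resolutions, chosen only for illustration.
for R in (1e2, 1e4, 1e8):
    m = optimal_integer_m(R)
    r = R ** (1.0 / m)
    print(f"R = {R:.0e}: m = {m}, r = {r:.3f}, N = {neuron_count(m, R):.2f}")
```

A brute-force search over a wide range of integer $m$ selects the same $m$ as the two-candidate rule, which is a convenient sanity check on the monotonicity argument.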



3 Optimizing the grid system: Bayesian decoder 

3.1 Neuron number and resolution 

In the main text we argued that the optimal scale factor in one dimension is $r = e$ assuming that decoding is
based on the responses of the most active cell at each scale. However, the decoding strategy could use more
information from the population of neurons. Thus, we consider a Bayes-optimal decoder that accounts for
all available information by forming a posterior distribution of position, given the activity of all grid cells
in the population. We can make quantitative predictions in this general setting if we assume that the firing
of different grid cells is statistically independent and that the tuning curves at each scale $i$ provide dense,
uniform coverage of the interval $\lambda_i$. With these assumptions, the posterior distribution of the animal's
position, given the activity of grid cells at the single scale $i$, $P(x \mid i)$, may be approximated by a series
of Gaussian bumps of standard deviation $\sigma_i$ spaced at the period $\lambda_i$. Furthermore,
$\sigma_i = c\,d^{-1/2}\,l_i$, where $l_i$ is the width of each tuning curve, $c$ is a dimensionless factor
incorporating the tuning curve shape and noisiness of single neurons, and $d$ is the coverage factor. The linear
dependence on $l_i$ follows from dimensional analysis. From the definition of $d$ given in the main text,
$d = n_i l_i/\lambda_i$, we see that $d$ can be interpreted as the number of cells with tuning curves overlapping
a given point in space. The square-root dependence of $\sigma_i$ on $d$ then follows, as this is the effective
number of neurons independently encoding position. We assume here that $d$ is large; this is necessary for the
Gaussian approximation to hold. Finally, combining the equation for $\sigma_i$ with the relationship
$n_i = d\,\lambda_i/l_i$ gives $n_i = c\sqrt{d}\,\lambda_i/\sigma_i$. Therefore, the total number of neurons,
which we would like to minimize, is $N = c\sqrt{d} \sum_{i=1}^{m} \lambda_i/\sigma_i$.

Figure 5: $\rho_{\max} = \max_{\sigma/\delta} \rho(\lambda/\sigma, \sigma/\delta)$ is the scaling factor after
optimizing $N$ over $\sigma/\delta$. The values $r^*$ and $\lambda^*$ are the values chosen by the complete
optimization procedure.



In the main text, we minimized $N$ while fixing the resolution $R_1$. In our present Bayesian decoding
model, $R_1$ will be related to the standard deviation $\delta_m$ of the distribution of location $x$ given the
activity of all $m$ scales, $Q_m(x)$. In general, the activity of the grid cells at all scales larger than
$\lambda_i$ provides a distribution over position $Q_{i-1}(x)$ which is combined with the posterior
$P(x \mid i)$ to find the distribution $Q_i(x)$ given all scales 1 to $i$. Since we assume independence across
scales, $Q_{i-1}(x)$ is obtained by taking the product over all the posteriors up to scale $i - 1$:
$Q_{i-1}(x) = \mathcal{N} \prod_{j=1}^{i-1} P(x \mid j)$, where $\mathcal{N}$ normalizes the distribution.
Furthermore, $Q_i(x) = \mathcal{N}' P(x \mid i)\, Q_{i-1}(x)$, with $\mathcal{N}'$ the corresponding
normalization. The posteriors from different scales have different periodicities, so multiplying them against
each other will tend to suppress all peaks except the central one, which is aligned across scales. We may thus
approximate $Q_{i-1}(x)$ and $Q_i(x)$ by single Gaussians whose standard deviations we will denote as
$\delta_{i-1}$ and $\delta_i$, respectively. The validity of this approximation is taken up in further detail in
section 3.2 below. By dimensional analysis, $\delta_i = \delta_{i-1}/\rho(\lambda_i/\sigma_i, \sigma_i/\delta_{i-1})$.
With the stated Gaussianity assumptions, the function $\rho$ may be explicitly defined and evaluated numerically
(section 3.2). A Bayes-optimal decoder will then estimate the animal's position with error proportional to the
posterior standard deviation over all $m$ scales, $\delta_m = \left(\prod_i \rho_i\right)^{-1} \delta_0$, and no
unbiased decoder can do better than this. (We are abbreviating $\rho_i = \rho(\lambda_i/\sigma_i, \sigma_i/\delta_{i-1})$.)
Thus, the resolution constraint imposed in the main text becomes, in the present context, a constraint on
$\prod_i \rho_i$. We will show below that $\rho_i$ is in fact equal to the scale factor $r_i = \lambda_i/\lambda_{i+1}$.

Thus, we would like to minimize $N = c\sqrt{d} \sum_{i=1}^{m} \lambda_i/\sigma_i$ subject to a constraint
$R = \prod_i \rho(\lambda_i/\sigma_i, \sigma_i/\delta_{i-1})$. The minimization is with respect to the parameters
$\lambda_i/\sigma_i$ and $\sigma_i/\delta_{i-1}$. We perform the calculation in two steps: first optimizing over
$\sigma_i/\delta_{i-1}$, then over $\lambda_i/\sigma_i$. The former parameter only affects $N$ indirectly, by
changing the number of scales $m$ through the constraint on $\prod_{i=1}^{m} \rho(\lambda_i/\sigma_i, \sigma_i/\delta_{i-1})$.
Choosing $\sigma_i/\delta_{i-1}$ to maximize $\rho$ will minimize $m$, and therefore $N$. We thus replace $\rho$
by $\rho_{\max}(\lambda/\sigma) = \max_{\sigma/\delta} \rho(\lambda/\sigma, \sigma/\delta)$ and minimize $N$
over the remaining parameters $\lambda_i/\sigma_i$. As in the main text, the problem has a symmetry under
permutations of the $i$, so the optimal $\lambda_i/\sigma_i$ and $\sigma_i/\delta_{i-1}$ are independent of $i$.
Thus, $m = \ln R/\ln \rho_{\max}$ and

$$N \propto \frac{\lambda/\sigma}{\ln \rho_{\max}(\lambda/\sigma)}.$$

We can invert the one-to-one relationship between $\rho_{\max}$ and $\lambda/\sigma$ (Fig. 5), and minimize $N$
over $\rho_{\max}$ to get $\rho_{\max} = 2.3$. In fact, $\rho$ is equal to the scale factor:
$\rho_i = r_i = \lambda_i/\lambda_{i+1}$. To see this, express $\rho_i$ as a product:

$$\rho_i = \frac{\delta_{i-1}}{\delta_i} = \frac{\delta_{i-1}}{\sigma_i} \cdot \frac{\sigma_i}{\lambda_i} \cdot \frac{\lambda_i}{\lambda_{i+1}} \cdot \frac{\lambda_{i+1}}{\sigma_{i+1}} \cdot \frac{\sigma_{i+1}}{\delta_i}.$$

Since the factors $\sigma_i/\delta_{i-1}$ and $\lambda_i/\sigma_i$ are independent of $i$, they cancel in the
product and we are left with $\rho_i = \lambda_i/\lambda_{i+1}$.
We have thus seen that the Bayesian decoder predicts an optimal scaling factor $r^* = 2.3$ in one dimension.
This is similar to, but somewhat different from, the winner-take-all result $r^* = e \approx 2.7$. At a technical
level the difference arises from the fact that the function $\rho_{\max}(\lambda/\sigma)$ does not satisfy
$\rho_{\max} = \lambda/\sigma$ as used previously, but is instead more nearly approximated by a linear function
with an offset: $\rho_{\max} \approx \alpha^{-1}(\lambda/\sigma + \beta)$. A more conceptual reason for the
difference is that the Gaussian posterior used here has long tails which are absent in the case with compact
firing fields. The scale factor must then be smaller to keep the ambiguous secondary peaks of the next scale far
enough into the tails to be adequately suppressed. The optimization also predicts $\lambda^* = 9.1\sigma$, which
may be combined with the formula $\sigma = c\,d^{-1/2}\,l$ to predict $l/\lambda$. However, this relationship
depends on the parameters $c$ and $d$, which may only be calculated from a more detailed description of the
single neuron response properties. For this reason, the general Bayesian analysis above does not predict the
ratio of the grid periodicity to the width of individual grid fields. Note that $\lambda^* = 9.1\sigma$ also
implies that $\lambda_{i+1}/\sigma_i = (\lambda_i/\sigma_i)/r^* \approx 9.1/2.3 \approx 4$, i.e. that the peaks
of the posterior distribution at scale $i + 1$ are separated by 4 of the standard deviations of the peaks at
scale $i$.

A similar Bayesian analysis can be carried out for two-dimensional grid fields. The posteriors $P(x \mid i)$
become two-dimensional sums-of-Gaussians, with the centers of the Gaussians laid out on the vertices of
the grid. $Q_i(x)$ is then similarly approximated by a two-dimensional Gaussian. The form of the function $\rho$
changes (section 3.2), but the logic of the above derivation is otherwise unaltered.

3.2 Calculating $\rho(\lambda/\sigma, \sigma/\delta)$

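Section 3.1 argued that the function $\rho(\lambda/\sigma, \sigma/\delta)$ can be computed by making the
approximation that the posterior distribution of the animal's position given the activity at a single scale $i$,
$P(x \mid i)$, is a periodic sum-of-Gaussians:

$$P(x \mid i) = \frac{1}{2K+1} \sum_{n=-K}^{K} \frac{1}{\sqrt{2\pi\sigma_i^2}}\, e^{-(x - n\lambda_i)^2/(2\sigma_i^2)}, \tag{9}$$

where $K$ is assumed to be large. We further approximate the posterior given the activity of all scales coarser
than $\lambda_i$ by a Gaussian with standard deviation $\delta_{i-1}$:

$$Q_{i-1}(x) = \frac{1}{\sqrt{2\pi\delta_{i-1}^2}}\, e^{-x^2/(2\delta_{i-1}^2)}. \tag{10}$$

Assuming independence across scales, it then follows that
$Q_i(x) = P(x \mid i)\,Q_{i-1}(x) / \int dx\, P(x \mid i)\,Q_{i-1}(x)$. Then $\rho(\lambda_i/\sigma_i, \sigma_i/\delta_{i-1})$
is given by $\delta_{i-1}/\delta_i$, where $\delta_i$ is the standard deviation of $Q_i$. We therefore must
calculate $Q_i(x)$ and its variance in order to obtain $\rho$. After some algebraic manipulation, we find,

$$Q_i(x) = \sum_{n=-K}^{K} \pi_n \frac{1}{\sqrt{2\pi\Sigma^2}}\, e^{-(x - \mu_n)^2/(2\Sigma^2)}, \tag{11}$$

where $\Sigma^2 = \left(\sigma_i^{-2} + \delta_{i-1}^{-2}\right)^{-1}$,
$\mu_n = \left(\frac{\delta_{i-1}^2}{\sigma_i^2 + \delta_{i-1}^2}\right) n\lambda_i$, and

$$\pi_n = \frac{1}{Z}\, e^{-n^2\lambda_i^2/(2(\sigma_i^2 + \delta_{i-1}^2))}. \tag{12}$$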

$Z$ is a normalization factor enforcing $\sum_n \pi_n = 1$. $Q_i$ is thus a mixture-of-Gaussians, seemingly
contradicting our approximation that all the $Q$ are Gaussian. However, if the secondary peaks of $P(x \mid i)$
are well into the tails of $Q_{i-1}(x)$, then they will be suppressed (quantitatively, if
$\lambda_i^2 \gg \sigma_i^2 + \delta_{i-1}^2$, then $\pi_n \ll \pi_0$ for $|n| \geq 1$), so that our assumed
Gaussian form for $Q$ holds to a good approximation. In particular, at the values of $\lambda$, $\sigma$, and
$\delta$ selected by the optimization procedure described in section 3.1, $\pi_1 = 1.3 \cdot 10^{-3}\, \pi_0$.
So our approximation is self-consistent.

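Next, we find the variance of $Q_i$:

$$\langle x^2 \rangle_{Q_i} = \sum_n \pi_n \left(\Sigma^2 + \mu_n^2\right) = \left(\sigma_i^{-2} + \delta_{i-1}^{-2}\right)^{-1} + \left(1 + \frac{\sigma_i^2}{\delta_{i-1}^2}\right)^{-2} \lambda_i^2 \sum_n n^2 \pi_n. \tag{13}$$

We can finally read off $\rho(\lambda_i/\sigma_i, \sigma_i/\delta_{i-1})$ as the ratio $\delta_{i-1}/\delta_i$:

$$\rho\left(\frac{\lambda_i}{\sigma_i}, \frac{\sigma_i}{\delta_{i-1}}\right) = \left(1 + \frac{\delta_{i-1}^2}{\sigma_i^2}\right)^{1/2} \left(1 + \left(1 + \frac{\sigma_i^2}{\delta_{i-1}^2}\right)^{-1} \left(\frac{\lambda_i}{\sigma_i}\right)^2 \sum_n n^2 \pi_n\right)^{-1/2}. \tag{14}$$

For the calculations reported in the text, we took $K = 500$.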
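As a cross-check on the closed-form ratio $\delta_{i-1}/\delta_i$, one can compare it against a brute-force evaluation in which $Q_i \propto P(x \mid i)\,Q_{i-1}(x)$ is built on a grid and its standard deviation is measured directly. The Python sketch below is illustrative only: the function names, the parameter values ($\lambda = 9.1$, $\sigma = 1$, $\delta_{i-1} = 2.3$), the integration window, and the truncation of the sums (smaller than the $K = 500$ used for the reported calculations) are all assumptions made for this example.

```python
import math

def rho_closed_form(lam, sigma, delta_prev, K=50):
    """delta_{i-1}/delta_i from the mixture weights pi_n of the sum-of-Gaussians posterior."""
    var_sum = sigma**2 + delta_prev**2
    ns = range(-K, K + 1)
    w = [math.exp(-(n * lam) ** 2 / (2.0 * var_sum)) for n in ns]  # unnormalized pi_n
    Z = sum(w)
    n2 = sum(n * n * wn for n, wn in zip(ns, w)) / Z               # sum_n n^2 pi_n
    prefactor = math.sqrt(1.0 + (delta_prev / sigma) ** 2)
    ambiguity = 1.0 + (lam / sigma) ** 2 * n2 / (1.0 + (sigma / delta_prev) ** 2)
    return prefactor / math.sqrt(ambiguity)

def rho_quadrature(lam, sigma, delta_prev, K=5, half_width=30.0, steps=20001):
    """Brute force: tabulate P(x|i) * Q_{i-1}(x) on a grid and measure its spread."""
    dx = 2.0 * half_width / (steps - 1)
    norm = 0.0
    second_moment = 0.0
    for k in range(steps):
        x = -half_width + k * dx
        p = sum(math.exp(-(x - n * lam) ** 2 / (2.0 * sigma**2)) for n in range(-K, K + 1))
        q = p * math.exp(-x * x / (2.0 * delta_prev**2))
        norm += q
        second_moment += x * x * q
    # The product is symmetric about x = 0, so the variance equals the second moment.
    delta_i = math.sqrt(second_moment / norm)
    return delta_prev / delta_i

lam, sigma, delta_prev = 9.1, 1.0, 2.3   # illustrative values only
rho_formula = rho_closed_form(lam, sigma, delta_prev)
rho_numeric = rho_quadrature(lam, sigma, delta_prev)
print(f"closed form: {rho_formula:.6f}, quadrature: {rho_numeric:.6f}")
```

The two evaluations should agree closely, since peaks with large $|n|$ carry negligible weight once they lie far in the tails of $Q_{i-1}$.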
Section 3.1 explained that we are interested in maximizing $\rho$ over $\sigma/\delta$, holding $\lambda/\sigma$
fixed. The first factor in $\rho$ increases monotonically with decreasing $\sigma/\delta$; however,
$\sum_n n^2 \pi_n$ also increases and this has the effect of reducing $\rho$. The optimal $\sigma/\delta$ is thus
controlled by a tradeoff between these factors. The first factor is related to the increasing precision given by
narrowing the central peak of $P(x \mid i)$, while the second factor describes the ambiguity from multiple peaks.
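The two-step optimization of section 3.1 can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the procedure used for the reported calculations: the grid ranges and spacings, the function names, and the truncation $K = 20$ (instead of $K = 500$) are choices made here, and $\rho$ is written in terms of the dimensionless ratios $a = \lambda/\sigma$ and $s = \sigma/\delta_{i-1}$.

```python
import math

def rho(a, s, K=20):
    """One-dimensional scale factor rho(lambda/sigma, sigma/delta) with a = lambda/sigma, s = sigma/delta."""
    t2 = 1.0 / (s * s)                       # (delta_{i-1} / sigma_i)^2
    ns = range(1, K + 1)
    # Unnormalized weights relative to pi_0 = 1; n and -n contribute equally.
    w = [math.exp(-(n * a) ** 2 / (2.0 * (1.0 + t2))) for n in ns]
    Z = 1.0 + 2.0 * sum(w)
    n2 = 2.0 * sum(n * n * wn for n, wn in zip(ns, w)) / Z   # sum_n n^2 pi_n
    return math.sqrt(1.0 + t2) / math.sqrt(1.0 + a * a * n2 / (1.0 + s * s))

def rho_max(a, s_grid):
    """Step 1: maximize rho over sigma/delta at fixed lambda/sigma."""
    return max(rho(a, s) for s in s_grid)

s_grid = [0.2 + 0.002 * k for k in range(401)]   # sigma/delta in [0.2, 1.0] (assumed range)
a_grid = [6.0 + 0.05 * k for k in range(121)]    # lambda/sigma in [6, 12] (assumed range)

# Step 2: minimize N, which is proportional to (lambda/sigma) / ln(rho_max).
cost, a_star = min((a / math.log(rho_max(a, s_grid)), a) for a in a_grid)
rho_star = rho_max(a_star, s_grid)
print(f"lambda/sigma ~ {a_star:.2f}, rho_max ~ {rho_star:.2f}")
```

The cost is quite flat near its minimum, so the located value of $\lambda/\sigma$ is sensitive to grid spacing; the sketch should nevertheless land in the neighbourhood of the values quoted in section 3.1.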

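The derivation can be repeated in the two-dimensional case. We take $P(\mathbf{x} \mid i)$ to be a
sum-of-Gaussians with peaks centered on the vertices of a regular lattice generated by the vectors
$(\lambda_i \mathbf{u}, \lambda_i \mathbf{v})$. We also define
$\delta_i^2 = \frac{1}{2}\langle |\mathbf{x}|^2 \rangle_{Q_i}$. The factor of 1/2 ensures that the variance so
defined is measured as an average over the two dimensions of space. The derivation is otherwise parallel to the
above, and the result is,

$$\rho\left(\frac{\lambda_i}{\sigma_i}, \frac{\sigma_i}{\delta_{i-1}}\right) = \sqrt{2} \left(1 + \frac{\delta_{i-1}^2}{\sigma_i^2}\right)^{1/2} \left(2 + \left(1 + \frac{\sigma_i^2}{\delta_{i-1}^2}\right)^{-1} \left(\frac{\lambda_i}{\sigma_i}\right)^2 \sum_{n,m} |n\mathbf{u} + m\mathbf{v}|^2\, \pi_{n,m}\right)^{-1/2}, \tag{15}$$

where $\pi_{n,m} = \frac{1}{Z}\, e^{-|n\mathbf{u} + m\mathbf{v}|^2 \lambda_i^2/(2(\sigma_i^2 + \delta_{i-1}^2))}$.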



4 Reanalysis of grid data from previous studies 

We reanalyzed the data from Barry et al. (13) and Stensola et al. (8) in order to get the mean and the variance
of the ratio of adjacent grid scales. For Barry et al. (13), we first read the raw data from Figure 3b of the
main text using the software GraphClick, which allows retrieval of the original (x, y)-coordinates from the
image. This gave the scales of grid cells recorded from 6 different rats. For each animal, we grouped the
grids that had similar periodicities (i.e. differed by less than 20%) and calculated the mean periodicity for
each group. We defined this mean periodicity as the scale of each group. For 4 out of 6 rats, there were 2
scales in the data. For 1 out of 6 rats, there were 3 grid scales. For the remaining rat, only 1 scale was
obtained as only 1 cell was recorded from that rat. We excluded this rat from further analysis. We then
calculated the ratio between adjacent grid scales, resulting in 6 ratios from 5 rats. The mean and variance of
the ratio were 1.64 and 0.09, respectively (n = 6).
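The grouping step can be sketched as follows. This Python example is a hypothetical illustration: the precise grouping rule is an assumption (periods are sorted, and a new group is started whenever a period exceeds its predecessor by 20% or more), the function names are invented for this sketch, and the listed periods are made-up numbers, not data from Barry et al. (13).

```python
from statistics import mean

def group_scales(periods, tol=0.20):
    """Group sorted periods; start a new group when a period exceeds its predecessor by >= tol."""
    groups = []
    for p in sorted(periods):
        if groups and p < groups[-1][-1] * (1.0 + tol):
            groups[-1].append(p)
        else:
            groups.append([p])
    # The scale of each group is defined as its mean periodicity.
    return [mean(g) for g in groups]

def adjacent_ratios(scales):
    """Ratios between each scale and the next smaller one."""
    return [big / small for small, big in zip(scales, scales[1:])]

# Made-up grid periods (cm) for one hypothetical animal; NOT data from the cited studies.
periods_cm = [30.0, 32.0, 31.0, 48.0, 50.0, 75.0]
scales = group_scales(periods_cm)
ratios = adjacent_ratios(scales)
print(scales, [round(r, 2) for r in ratios])
```

An animal with three scales contributes two ratios and an animal with two scales contributes one, which is how 5 rats yield 6 ratios in the analysis above.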

For Stensola et al. (8), we first read in the data using GraphClick from Figure 5d of the main text. This
gave the scale ratios between different grids for 16 different rats. We then pooled all the ratios together
and calculated the mean and variance. The mean and variance of the ratio were 1.42 and 0.17, respectively
(n = 24).

Giocomo et al. (14) reported the ratios between the grid period and the radius of the grid field (measured as
the radius of the circle around the center field of the autocorrelation map of the grid cells) to be 3.26 ± 0.07
and 3.32 ± 0.06 for wild-type and HCN KO mice, respectively. We linearly transformed these measurements to
the ratios between grid period and the diameter of the grid field to facilitate the comparison to our theoretical
predictions. The results are plotted in a bar graph (Fig. 4B in the main text).

Finally, in Figure 4C, we replotted Fig. 1c from (6) by reading in the data using GraphClick and then
translating that information back into a plot.



5 General optimality of the triangular lattice 

Our task is to minimize the number of neurons in a population made up of $m$ modules,
$N = d \sum_{i=1}^{m} |v_\perp| (\lambda_i/l_i)^2$, subject to a constraint on resolution
$R = F(\{\lambda, l, \mathbf{u}, \mathbf{v}\}, m)$. The specific form of the resolution function $F$
will, of course, depend on the details of tuning curve shape, noise, and decoder performance. Nevertheless,
we will prove that the triangular lattice is optimal in all models sharing the following general properties:

• Uniqueness: Our optimization problem has a unique solution for all $R$. The optimal parameters are
continuous functions of $R$.

• Symmetry: Simultaneous rotation of all firing rate maps leaves $F$ invariant. Likewise, $F$ is invariant
under simultaneous rescaling of all maps. These transformations are manifestly symmetries of the
neuron number $N$. Rotation invariance implies that $F$ depends on $\mathbf{u}$ and $\mathbf{v}$ only through
the two scalar parameters $v_\perp$ and $v_\parallel$ (the components of $\mathbf{v}$ orthogonal to and parallel
to $\mathbf{u}$, respectively). Scale invariance implies that the dependence on the dimensionful parameters
$\{\lambda, l\}$ is only through the ratios $\{r, \lambda/l\}$, where $r_i = \lambda_i/\lambda_{i+1}$ are the
scale factors. The resolution formulas in both the winner-take-all and the Bayesian formulations are evidently
scale-invariant, as they depend only on dimensionless ratios of grid parameters. We will also assume that
firing fields are circularly symmetric.

• Asymptotics: The resolution $F(\{r, \lambda/l\}, v_\parallel, v_\perp, m)$ increases monotonically with each
$\lambda_i/l_i$. When all $\lambda_i/l_i \to \infty$, the grid cells are effectively place cells and so the grid
geometry cannot matter. Therefore, $F$ becomes independent of $\mathbf{v}$ in this limit.

We will first argue that the uniqueness and symmetry properties imply that the optimal lattice can only 
be square or triangular. The asymptotic condition then picks out the triangular grid as the better of these two. 
To see the implications of the symmetry condition, consider the following transformation of the parameters: 

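$$\begin{aligned}
v_\perp &\to -v_\perp/|\mathbf{v}|^2 \\
v_\parallel &\to v_\parallel/|\mathbf{v}|^2 \\
l_i &\to l_i/|\mathbf{v}|
\end{aligned}$$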
This takes the vector $\mathbf{v}$, reflects it through $\mathbf{u}$ (keeping the same angle with $\mathbf{u}$),
and scales it to have length $1/|\mathbf{v}|$. This new $\mathbf{v}$, together with $\mathbf{u}$, thus generates
the same lattice as the original $\mathbf{u}$ and $\mathbf{v}$, but rotated, scaled, and with the roles of
$\mathbf{u}$ and $\mathbf{v}$ exchanged. We then also scale all field width parameters by the
same factor $1/|\mathbf{v}|$ to compensate for the stretching of the lattice. And although this is a rotation of
the lattice and not the firing fields, our assumed isotropy of the firing fields implies that the transformation is
indistinguishable from a rotation of the entire rate map. Since the overall transformation is equivalent to a
common rotation and scaling of all rate maps, it will (by our symmetry assumption) leave the neuron number
and resolution unchanged. If the optimal lattice is unique, it must then be invariant under this transformation.

Which lattices are invariant under the above transformation? It must take the generator v to another 
generator v' of the same lattice. This requirement demands that the generators are related by a modular 
transformation: 

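$$\begin{aligned}
\mathbf{v}' &= a\mathbf{v} + b\mathbf{u} \\
\mathbf{u} &= c\mathbf{v} + d\mathbf{u},
\end{aligned}$$

with $a, b, c, d$ integers such that $|ad - bc| = 1$. The second equation, and linear independence of
$\mathbf{u}$ and $\mathbf{v}$, require $c = 0$, $d = 1$ and so $|a| = 1$. Plugging in our transformation of
$\mathbf{v}$, the first equation then gives $a = -1$, $|\mathbf{v}| = 1$ and $v_\parallel = b/2$. Since
$\mathbf{v} + n\mathbf{u}$ will generate the same lattice as $\mathbf{v}$, for any integer $n$, we may
assume $0 \leq v_\parallel < 1$. The only solutions are the square lattice with $v_\parallel = 0$, $v_\perp = 1$
and the triangular lattice with $v_\parallel = 1/2$, $v_\perp = \sqrt{3}/2$.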
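The enumeration of invariant lattices can be checked with a short Python sketch. It assumes $\mathbf{u} = (1, 0)$ and uses hypothetical helper names (`invariant_lattices`, `transform`). It lists the solutions of $|\mathbf{v}| = 1$, $v_\parallel = b/2$ with $0 \leq v_\parallel < 1$, and verifies that the reflection-and-rescaling map sends each $\mathbf{v}$ to $-\mathbf{v} + b\mathbf{u}$.

```python
import math

def invariant_lattices():
    """Solutions of |v| = 1 and v_par = b/2 (b integer) with 0 <= v_par < 1."""
    solutions = []
    for b in range(-4, 5):
        v_par = b / 2.0
        if 0.0 <= v_par < 1.0:
            v_perp = math.sqrt(1.0 - v_par**2)
            solutions.append((v_par, v_perp))
    return solutions

def transform(v_par, v_perp):
    """Reflect v through u and rescale it to length 1/|v|."""
    norm2 = v_par**2 + v_perp**2
    return (v_par / norm2, -v_perp / norm2)

for v_par, v_perp in invariant_lattices():
    b = round(2 * v_par)
    new_par, new_perp = transform(v_par, v_perp)
    # Invariance requires v' = -v + b*u with u = (1, 0).
    assert math.isclose(new_par, -v_par + b, abs_tol=1e-12)
    assert math.isclose(new_perp, -v_perp, abs_tol=1e-12)
    print(f"v_par = {v_par:.3f}, v_perp = {v_perp:.3f}, b = {b}")
```

The two printed solutions correspond to the square lattice and the triangular lattice; the latter has the smaller $|v_\perp|$.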

It remains to choose between these two possibilities. We want to minimize
$N = d \sum_i |v_\perp| (\lambda_i/l_i)^2$, so it seems that we should minimize $|v_\perp|$, giving the
triangular lattice. However, the constraint on resolution will introduce $\mathbf{v}$-dependence into
$\lambda/l$, so it is not immediately clear that we can minimize $N$ by minimizing $|v_\perp|$ alone. But the
asymptotic condition implies the existence of a large-$R$ regime tied to large $\lambda/l$, and asserts that in
this limit the $\mathbf{v}$-dependence drops out. Therefore, the triangular lattice is optimal for large
enough $R$. Since the only other possible optimum is the square lattice, and our uniqueness assumption
prevents the solution from changing discontinuously as $R$ is lowered, it must be the case that the triangular
lattice is optimal for all $R$.